
Conversation


@s3inlc s3inlc commented Jun 20, 2024

This PR addresses several issues with loading/handling on systems that have a large number of tasks or a large number of agents:

  • When creating chunks, a single global lock file was used so far, which was locked regardless of which task a chunk was being created for. Locking per task is sufficient, as creating two chunks for two different tasks at the same time poses no problem.
  • When an active task has a very large number of chunks, chunk assignment on the server may take so long that the agent assumes the connection has timed out (30 s). This can lead to situations where all agents end up in a loop of requesting chunks/tasks, while the server can no longer respond fast enough because checking an active task meant looping over all of its chunks (which obviously takes longer the more chunks there are). This is solved by the changes in TaskUtils: instead of looping over all existing chunks, it now loops only over the non-completed ones and uses SUM in SQL to determine the other required values.
  • In general, there are many places in the code where multiple queries sum/count/max over columns (or, even worse, in most of those places the entries are loaded and looped over in code). With many chunks, the biggest problem was in the getTaskInfo() function, which is used by the current old UI and caused very long loading times on the tasks page when tasks with many chunks were listed. To tackle this, the DBA was extended slightly so it can sum/count/max/... over multiple columns in the same query (as long as the WHERE conditions are the same for all of them). In the specific case of getTaskInfo(), this reduced loading times by roughly a factor of 5-6.
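
The per-task locking change described in the first point can be sketched as follows. This is a minimal Python sketch on a Unix system; the lock-file naming scheme and function names are illustrative only, not Hashtopolis's actual (PHP) implementation:

```python
import fcntl
import os
import tempfile

LOCK_DIR = tempfile.gettempdir()

def acquire_task_lock(task_id):
    """Lock chunk creation for a single task (illustrative scheme).

    With one lock file per task, creating a chunk for task A never
    blocks chunk creation for task B. A single global lock file would
    serialize chunk creation across all tasks.
    """
    fh = open(os.path.join(LOCK_DIR, f"chunk-task-{task_id}.lock"), "w")
    # Non-blocking here so a conflict raises instead of waiting.
    fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fh

def release_task_lock(fh):
    fcntl.flock(fh, fcntl.LOCK_UN)
    fh.close()

# Two different tasks can create chunks concurrently:
lock_a = acquire_task_lock(1)
lock_b = acquire_task_lock(2)  # succeeds; a single global lock would conflict here
release_task_lock(lock_a)
release_task_lock(lock_b)
```

A second attempt to lock the same task's file would raise, which is exactly the serialization that is still wanted within one task.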

The DBA function change was already adopted for the getChunkInfo() function as well, but there are likely many more places where using either a single aggregation function or the newly added multi-column aggregation could yield additional speedups.
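
The aggregation idea can be illustrated with plain SQL (using Python's sqlite3 here; the Chunk table layout, column names, and state values are simplified placeholders, not the actual Hashtopolis schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Chunk (taskId INT, length INT, checked INT, state INT)")
conn.executemany(
    "INSERT INTO Chunk VALUES (?, ?, ?, ?)",
    [
        (1, 100, 100, 9),  # state 9 = completed (placeholder value)
        (1, 100, 40, 2),   # running
        (1, 100, 0, 0),    # pending
    ],
)

# Before: one query (or worse, an application-side loop) per aggregate value.
total = conn.execute("SELECT SUM(length) FROM Chunk WHERE taskId = 1").fetchone()[0]
done = conn.execute("SELECT SUM(checked) FROM Chunk WHERE taskId = 1").fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM Chunk WHERE taskId = 1").fetchone()[0]

# After: one multi-column aggregation, possible because the WHERE clause
# is identical for every aggregate.
total2, done2, count2 = conn.execute(
    "SELECT SUM(length), SUM(checked), COUNT(*) FROM Chunk WHERE taskId = 1"
).fetchone()
assert (total, done, count) == (total2, done2, count2)

# And only the non-completed chunks still need to be walked individually:
open_chunks = conn.execute(
    "SELECT rowid FROM Chunk WHERE taskId = 1 AND state != 9"
).fetchall()
```

The saving is not in the aggregation itself (the database does that either way) but in collapsing several round trips, or a row-by-row loop in application code, into one query.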

zyronix and others added 3 commits April 22, 2024 10:48
Adding loops to scan through lines to support importing hashes longer than 1024 bytes
@s3inlc s3inlc requested a review from zyronix June 20, 2024 12:08
@s3inlc s3inlc changed the base branch from master to dev September 27, 2024 13:46
@s3inlc s3inlc requested review from jessevz and removed request for zyronix September 27, 2024 13:47
@jessevz jessevz self-assigned this Oct 14, 2024
@s3inlc s3inlc requested a review from jessevz November 20, 2024 10:41
@jessevz jessevz merged commit fe16f39 into dev Nov 21, 2024
1 check passed
@s3inlc s3inlc deleted the aggregation-optimization branch July 31, 2025 06:52

Status: 🎉 Done


4 participants